SLAM glossary

This glossary of SLAM related terms is intended to help demystify some of the language that is commonly used, and explain the concepts in a simplified way to a newcomer. (It is assumed that the reader has at least a basic understanding of linear algebra and statistics, though explanations try to be accessible even without this knowledge.)

Bag-of-Words (BoW)

A method for building a frequency distribution of words in a document. The resulting distribution can be used to classify the document, usually based on what the most common words are.

Although originally designed for readable text documents, the concept of a “word” and a “document” can be translated to other domains. In computer vision, an image may be treated as a document, and descriptors of the features found in the image may be treated as words. The bag-of-words approach may then be used as a way to identify images whose features have similar distributions.
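As a sketch of the idea, a visual bag-of-words histogram can be built by assigning each feature descriptor to its nearest "word" in a vocabulary of descriptor centroids. The Python example below invents a tiny 2D vocabulary and toy descriptors purely for illustration; real systems use high-dimensional descriptors and vocabularies learned by clustering.

```python
# Sketch of a visual bag-of-words histogram, assuming a pre-built
# "vocabulary" of descriptor centroids (invented toy values here).
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest vocabulary word and
    return the normalised word-frequency histogram."""
    # Distance from every descriptor to every vocabulary word
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)        # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()            # frequency distribution

# Toy 2D "descriptors" and a 3-word vocabulary
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [5.1, 4.9], [4.8, 5.2]])
print(bow_histogram(desc, vocab))   # half the features fall on word 2
```

Images whose histograms are similar (by some distance measure between distributions) are then candidates for showing the same place.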

Bundle (Geometry)

A group of geometric items which share a common property, eg. passing through the same point.

Bundle Adjustment

The process of refining estimates of where 3D points lie in the surrounding environment. Estimates of the camera’s movement and optical distortion may also be refined as part of the process. “Bundle” in this name refers to geometric bundles of light rays, whose common property is that they pass through the optical centre of the camera.

In principle, a sensor such as a stereo camera can allow a program to estimate the 3D location of a feature that has been identified. However, if more information is available (for example, many stereo images of the feature, taken from different viewpoints), this information can be combined to improve the accuracy of the estimate.

Given a 3D estimate of a feature’s location, and a mathematical model of the camera, the accuracy of the estimate can be measured by projecting the 3D point back onto the camera’s mathematical image plane. The co-ordinates of this point on the plane are compared to the co-ordinates of the actual feature in the original image. The more accurate the 3D estimate, the smaller the distance on the plane between its projected point and the image feature point. This distance is known as the reprojection error.

Bundle adjustment attempts to minimise the reprojection error for every image feature, across all images in which they are seen. This equates to solving a least squares optimisation problem, where the 3D locations of the points, along with the camera parameters, are refined to minimise the total reprojection error across all the points.
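The reprojection error for a single point can be sketched as follows, assuming a simple pinhole camera model with focal length f and principal point (cx, cy); the numbers are invented example values.

```python
# Minimal sketch of reprojection error for one 3D point, assuming a
# pinhole camera with focal length f and principal point (cx, cy).
import numpy as np

def project(point_3d, f, cx, cy):
    """Project a 3D point (camera frame, z forward) onto the image plane."""
    x, y, z = point_3d
    return np.array([f * x / z + cx, f * y / z + cy])

def reprojection_error(point_3d, observed_px, f, cx, cy):
    """Distance between the projected point and the observed feature."""
    return np.linalg.norm(project(point_3d, f, cx, cy) - observed_px)

# A 3D estimate, and the pixel where the feature was actually detected
estimate = np.array([0.5, 0.2, 2.0])
observed = np.array([420.0, 285.0])
err = reprojection_error(estimate, observed, f=600.0, cx=320.0, cy=240.0)
```

Bundle adjustment would then adjust the 3D point (and the camera parameters) to reduce the sum of squared errors like `err` over all observations.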

Calibration (Of Camera)

The process of collecting sensor data from a camera, and establishing a mathematical model of the properties of that camera.

The collected data includes image data, and possibly also IMU data if an IMU is built into the camera. The resulting mathematical model of the camera can include both intrinsic and extrinsic parameters. Applications such as Kalibr can determine these parameters given a carefully captured video sequence.

Note that the strict definition of “calibration” means only to compare measurements of a device against a known reference. However, the common usage of the word often includes the process of correcting the device’s measurement error. The computed intrinsics and extrinsics can be used for this purpose.

Computing the camera intrinsics is closely related to the process of camera resectioning.

Covariance

A measure of the joint variability of two random variables. This means that if the covariance is positive, greater values in one of the variables are correlated with greater values in the other. If the covariance is negative, greater values in one of the variables are correlated with lesser values in the other. A covariance of around zero implies no linear correlation between the values of the variables (though they may still be related in a non-linear way).

A related measure, the correlation coefficient, is essentially a normalised form of the covariance. It is 1 if there is a perfect positive linear correlation between the two variables, and -1 if there is a perfect negative linear correlation.
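These quantities are easy to check numerically; the sketch below uses NumPy on two toy variables, one a perfect linear function of the other.

```python
# Quick numerical check of covariance and the correlation coefficient.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0            # perfectly positively correlated with x

cov = np.cov(x, y)[0, 1]     # off-diagonal entry of the covariance matrix
corr = np.corrcoef(x, y)[0, 1]

print(cov)    # positive: larger x goes with larger y
print(corr)   # 1.0: perfect positive linear correlation
```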

Degrees of Freedom (DoF)

The number of parameters in which a system may vary independently.

In Slamcore, the SLAM agent is often described as having six degrees of freedom – these are translation in X, Y and Z, and rotation around X, Y and Z.

Descriptor (Of Feature)

An encoding of information about an image feature.

Descriptors can be used to refer to features identified in images. This is essential for processes such as image registration, where features are used to identify common objects across different images. A descriptor is generated for each feature, and descriptors that are sufficiently similar across the different images are likely to identify the same object.
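For binary descriptors (as produced by detectors such as ORB), similarity is commonly measured by Hamming distance. The sketch below matches invented 16-bit toy descriptors between two "images"; real descriptors are much longer and matching is accelerated with specialised data structures.

```python
# Sketch of matching binary feature descriptors via Hamming distance.
# The 16-bit descriptor values are invented for illustration.
import numpy as np

def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match(descriptors_a, descriptors_b, max_dist=3):
    """For each descriptor in A, find the closest descriptor in B,
    keeping only matches below a distance threshold."""
    matches = []
    for i, da in enumerate(descriptors_a):
        dists = [hamming(da, db) for db in descriptors_b]
        j = int(np.argmin(dists))
        if dists[j] <= max_dist:
            matches.append((i, j))
    return matches

img_a = [0b1010101010101010, 0b1111000011110000]
img_b = [0b1111000011110010, 0b1010101010101011]  # near-copies, reordered
print(match(img_a, img_b))  # → [(0, 1), (1, 0)]
```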

Drift (Of Trajectory)

The error in the localisation of the SLAM agent with respect to its environment, when accumulated over time.

A SLAM agent will almost always have uncertainty inherent in its movement commands (as well as in its sensing of its surrounding environment). As the agent continues to move through the environment, this causes the error between its estimated position and its actual position to compound. As a consequence, the accuracy of the agent’s estimated location appears to get worse the longer it runs for.

This drift can be corrected by detecting loop closures. A loop closure helps the SLAM agent re-localise itself against an area it has already seen in the past, which provides an “anchoring” effect. The process of pose graph optimisation propagates this correction to the agent’s entire trajectory.
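The compounding effect can be illustrated with a toy simulation: a robot is commanded to drive straight, but a small unmodelled heading error on each step makes its position error grow over time. A constant bias is used here for reproducibility; in reality the per-step error would be random.

```python
# Toy illustration of drift: a small per-step heading error compounds,
# so the gap between estimated and commanded position keeps growing.
import numpy as np

heading = 0.0
pos = np.zeros(2)
true_pos = np.zeros(2)
errors = []
for step in range(1000):
    heading += 0.001                      # tiny unmodelled heading error
    pos += 0.1 * np.array([np.cos(heading), np.sin(heading)])
    true_pos += np.array([0.1, 0.0])      # the commanded motion: straight
    errors.append(float(np.linalg.norm(pos - true_pos)))

print(errors[9], errors[-1])  # the error grows the longer the robot runs
```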

Extrinsics (Parameters Of Camera)

The parameters of a camera that relate to its position and orientation in its environment. These are usually separated from the intrinsic parameters.

Feature (Of Image)

A part of an image that is considered interesting or useful. In SLAM, image features often refer to visual structures that are distinctive. These features have a good likelihood of being identified consistently and unambiguously, and therefore can be matched between different images.

Fiducial (Marker)

A marker that serves as a visual point of reference in an environment. Fiducial markers can act as visual anchors for location, orientation and scale, and can also distinguish between different locations in an environment if each marker is uniquely identifiable.

At Slamcore, the SLAM system uses fiducial markers to provide the SLAM agent with known points of reference about its environment. This can aid loop closures: if a fiducial is detected, the agent can be highly confident of its location in the environment.

Frame (Image)

A visual sensor reading, as opposed to data that comes from an IMU or a wheel encoder.

Slamcore often equates frames and poses, since visual data is required for the Slamcore system to take a meaningful observation of its environment, and therefore to produce a pose.

Frame is an overloaded term! See also: Frame (Of Reference) and Frame (Iteration)

Frame (Of Reference)

A particular co-ordinate system in the physical world. Can be used interchangeably with “reference frame”.

Co-ordinates exist with respect to a frame of reference, and can be transformed between different frames of reference. Some common frames of reference used by Slamcore are:

  • World: Co-ordinates in the map of the environment
  • Fiducial-world: Co-ordinates with respect to the placed fiducials
  • Sensor: Co-ordinates with respect to a particular sensor, eg. a camera

Frame is an overloaded term! See also: Frame (Image) and Frame (Iteration).

Hessian (Matrix)

A matrix of a function’s second-order partial derivatives. The function must take one or more input variables, and output a single variable.

In (relative) layman’s terms, each cell in the matrix describes the curvature of the function with respect to some of its variables. For the details of the underlying mathematics, see the Wikipedia page’s definition of a Hessian matrix.

Hessian matrices are often used to determine the maxima and minima of a function.
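A Hessian can be approximated numerically with finite differences. The sketch below does this for f(x, y) = x² + 3xy, whose true Hessian is [[2, 3], [3, 0]] everywhere.

```python
# Numerical sketch of a Hessian: central finite-difference second
# derivatives of f(x, y) = x**2 + 3*x*y (true Hessian [[2, 3], [3, 0]]).
import numpy as np

def f(v):
    x, y = v
    return x**2 + 3*x*y

def hessian(f, v, h=1e-4):
    """Central finite-difference approximation of the Hessian at v."""
    n = len(v)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(v + e_i + e_j) - f(v + e_i - e_j)
                       - f(v - e_i + e_j) + f(v - e_i - e_j)) / (4 * h**2)
    return H

print(hessian(f, np.array([1.0, 2.0])))  # ≈ [[2, 3], [3, 0]]
```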

Inertial Measurement Unit (IMU)

A sensor which reports measurements of forces acting upon it. IMUs often include an accelerometer to measure 3D accelerations, and a gyroscope to measure 3D rotations. Some IMUs also use magnetometers to measure magnetic fields, eg. to determine the direction of magnetic north.

In a robot, an IMU can be used to infer the robot’s orientation via the gyroscope, or to infer its acceleration or deceleration via the accelerometer. The latter can also be used to estimate the robot’s velocity, though this can be unreliable as it requires integrating the measurements of acceleration over time, leading to accumulation of error.
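The accumulation of error can be seen in a few lines: integrating an accelerometer reading that carries even a small constant bias produces a velocity error that grows linearly with time. The bias and rate below are invented example values.

```python
# Sketch of why velocity from an accelerometer drifts: integrating a
# measurement with a small constant bias makes the error grow over time.
dt = 0.01          # 100 Hz IMU
bias = 0.05        # m/s^2 of unmodelled accelerometer bias
true_accel = 0.0   # the robot is actually stationary

velocity = 0.0
for _ in range(1000):            # 10 seconds of integration
    measured = true_accel + bias
    velocity += measured * dt    # v += a * dt

print(velocity)  # ≈ 0.5 m/s of spurious velocity after 10 s
```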

Intrinsics (Parameters Of Camera)

The parameters of a camera that relate to how the camera captures images. These include aspects like the focal length, the aperture, and the lens distortion. These are usually separated from the extrinsic parameters.

Jacobian (Matrix)

A matrix of a function’s first-order partial derivatives.

In (relative) layman’s terms, each cell in the matrix describes the gradient of part of the function, with respect to one of the function’s input variables. For the details of the underlying mathematics, see the Wikipedia page’s definition of a Jacobian matrix. This page contains more information about when and how a Jacobian matrix may be useful in machine learning.
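Like the Hessian, a Jacobian can be approximated numerically. The sketch below does this for the polar-to-Cartesian map f(r, θ) = (r·cos θ, r·sin θ), whose true Jacobian is [[cos θ, -r·sin θ], [sin θ, r·cos θ]].

```python
# Numerical sketch of a Jacobian: central finite differences of the
# polar-to-Cartesian map f(r, theta) = (r*cos(theta), r*sin(theta)).
import numpy as np

def f(v):
    r, theta = v
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian(f, v, h=1e-6):
    """Central finite-difference approximation of the Jacobian at v."""
    n = len(v)
    cols = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        cols.append((f(v + e) - f(v - e)) / (2 * h))
    return np.stack(cols, axis=1)

v = np.array([2.0, 0.0])
print(jacobian(f, v))  # ≈ [[1, 0], [0, 2]] at r=2, theta=0
```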

Landmark

A recognisable feature that stands out from its environment.

In SLAM, the word “landmark” is often used to refer to an image feature which represents something notable in the surrounding environment, and which can be used to relate separate images of the same area, for example in image registration.

Because of this correspondence, image landmarks and physical landmarks can be considered somewhat interchangeable: the word “landmark” can denote an image feature, or can denote the physical structure that the feature identifies.

Least Squares (Method)

A method of choosing an optimal function to describe a set of data, by minimising the square of the errors between the function and the data.

The optimal function is the one that produces the least error between its values and the data points in the set. As part of the optimisation, the parameters of the function are refined to find those that produce the least overall error.

In this case, the error is measured as the squared difference between a data point and the relevant function value, hence least squares.
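A minimal example, assuming NumPy: fitting a line y = m·x + c to toy data with `np.linalg.lstsq`, which minimises the sum of squared errors.

```python
# Minimal least-squares example: fitting a line y = m*x + c to toy data.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                            # data generated from m=2, c=1

A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
(m, c), residuals, *_ = np.linalg.lstsq(A, y, rcond=None)
print(m, c)  # recovers m ≈ 2, c ≈ 1
```

With noisy data the recovered parameters would not be exact, but would still minimise the total squared error.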

Localisation

The process of computing one’s location within an environment, usually with respect to a map of the environment. This may be a pre-existing static map, or it may be learned and refined as the subject explores its environment, as is the case in SLAM.

Loop Closure

The detection of a previously visited area of the environment, which can be used to update a SLAM agent’s map and improve overall localisation accuracy.

When a SLAM agent is exploring its environment and creating a pose graph, it can only measure the relative changes in its location between sequential poses. This means that the overall uncertainty the agent has of its location accumulates over time.

If the agent is able to recognise that it has returned to a location it has seen previously, this means it can correlate two or more poses with one another. Given this new constraint, the locations of other poses in the graph can be re-evaluated to improve their accuracy.

When two existing poses are correlated with one another, this equates to adding an edge between them in the graph, forming a loop (hence loop closure). This new graph edge describes the relative transformation that the SLAM agent estimates between the two poses.

To make use of the new constraint introduced by a loop closure, the SLAM agent will run a pose graph optimisation. This process updates all poses in the graph, taking into account all connections between them.

It is important to note that the SLAM agent should be confident that it has truly returned to an area it has already seen when deciding to invoke a loop closure. A mistaken loop closure is very detrimental to the overall accuracy of a map, and cannot be easily removed from the graph. This page gives a short video overview of the SLAM process, and what happens when pose graph optimisation is performed.

Map

A spatial model of an environment. A map usually contains information about which areas of the environment are empty (traversable), and which are occupied (not traversable).

Mapping

The process of exploring an environment, and building up a map of that environment.

Noise Model

A mathematical model that describes the nature of the uncertainty in a given set of data readings.

It is straightforward to appreciate that there will be a certain amount of inaccuracy (ie. noise) in the data provided by a sensor. Knowing the shape of the distribution of this noise helps to better estimate the uncertainty around a given measurement, and better understand where the true value might lie.

An example might be the knowledge that the uncertainty in a robot’s rotation scales linearly with the amount of time that the robot spends rotating.

For the purposes of simplification, the distribution of noise is often assumed to be Gaussian in some way.

Odometry

The process of estimating a position based on sensor data. Slamcore often uses the following methods of odometry:

  • Wheel odometry: Estimating the distance travelled by counting the number of rotations of a robot’s wheels.
  • Visual odometry: Estimating motion by tracking features across multiple images over time.
  • Visual-inertial odometry: Estimating motion by combining visual feature tracking with IMU measurements of acceleration and orientation. This is more robust than visual odometry on its own, as the different types of measurements can be used to support one another and reduce overall uncertainty.
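The wheel odometry case can be sketched in a few lines for a differential-drive robot, where distance comes from the average of the two wheels and heading change from their difference. The wheel radius and track width below are invented example values.

```python
# Sketch of wheel odometry for a differential-drive robot.
# Wheel radius and track width are invented example values.
import math

WHEEL_RADIUS = 0.05   # metres
TRACK_WIDTH = 0.30    # distance between the two wheels, metres

def wheel_odometry(left_rotations, right_rotations):
    """Return (distance travelled, heading change in radians)."""
    left = left_rotations * 2 * math.pi * WHEEL_RADIUS
    right = right_rotations * 2 * math.pi * WHEEL_RADIUS
    distance = (left + right) / 2          # average of the two wheels
    dtheta = (right - left) / TRACK_WIDTH  # differential turns the robot
    return distance, dtheta

d, dtheta = wheel_odometry(10.0, 10.0)  # equal rotations: straight line
print(d, dtheta)  # ≈ 3.14 m travelled, no heading change
```

Wheel slip means the true motion can differ from these estimates, which is one reason odometry sources are usually fused.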

Offline (Processing)

Computation that is performed asynchronously and independently, usually because it is too intensive to be completed within a short period of time, or because it must be executed in a batch rather than iteratively. Is complemented by online processing.

Online (Processing)

Computation that is performed live as part of a step of an algorithm. The computation is usually lightweight, and performed iteratively over consecutive steps. Is complemented by offline processing.

Partial Derivative

The derivative of a function with respect to one of its input variables, where the other input variables are assumed not to change.

This is only meaningful for a function which takes multiple input variables. If your function just takes one variable, there is only one way to differentiate it.

On the other hand, if your function takes multiple variables, you can explore the derivative of the function in different ways, depending on the cases you are interested in:

  • Differentiating with respect to one variable, and restricting your interest to the cases where the other input variables do not change. This is a partial derivative.
  • Differentiating with respect to one variable, and considering what happens when other variables are allowed to change. This is a total derivative.

It is generally easier to compute a partial derivative than a total derivative, because a total derivative must take into account more input influences.
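The distinction can be illustrated numerically for f(x, y) = x·y where y itself depends on x as y = x². The partial derivative holds y fixed; the total derivative lets y change with x.

```python
# Numeric illustration of partial vs total derivative for f(x, y) = x*y,
# where y depends on x as y = x**2.
def f(x, y):
    return x * y

def y_of_x(x):
    return x**2

x0 = 2.0
h = 1e-6

# Partial derivative: y held fixed at y(x0)
partial = (f(x0 + h, y_of_x(x0)) - f(x0 - h, y_of_x(x0))) / (2 * h)

# Total derivative: y allowed to change along with x
total = (f(x0 + h, y_of_x(x0 + h)) - f(x0 - h, y_of_x(x0 - h))) / (2 * h)

print(partial)  # ≈ 4:  ∂f/∂x = y, evaluated at y(2) = 4
print(total)    # ≈ 12: df/dx = y + x*(dy/dx) = 3*x**2 at x = 2
```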

Pose

A position and orientation snapshot of an object. In SLAM, poses describe where the agent was in its environment at a particular instant in time, and are in relation to the agent’s map.

Pose Graph

A graph used in SLAM, where each node represents a pose of the SLAM agent in the environment, and each edge represents an estimated translation + rotation between the two connected poses.

The main useful property of a pose graph is that it relates sequential poses by their relative transformations. A SLAM agent has no way of knowing for certain what its real-world position is, but it can measure its relative translation and rotation from one point to the next with reasonable confidence (or at least with a known model of the expected error). A pose graph encodes this information in a way that allows the poses to be revisited and refined over time.

It should be noted that an edge between two poses also implies that the space in the map between those two poses is not occupied, since the SLAM agent was able to move between the poses.

The process of refining the overall pose graph given new constraints is called pose graph optimisation. This is performed when the SLAM system detects a loop closure.

Pose Graph Optimisation (PGO)

The process of updating a SLAM pose graph to improve the pose estimates.

In the pose graph, each edge represents a relative change in location and orientation between two poses. This transformation has a certain level of uncertainty associated with it, which determines how “flexible” the edge in the graph can be. If an edge has low uncertainty, the SLAM agent was very confident in its movement, and so the transformation should be considered reliable; if the edge has a greater uncertainty, there is a higher variability in the movement that the SLAM agent could have taken, and so the transformation should not be relied upon as strongly.

Pose graph optimisation revisits and revises the poses in the graph, and the transformations between them, based on the new pose-to-pose relationship introduced by a loop closure. The optimisation performed is to minimise the “tension” in the graph, where edges with a lower uncertainty introduce more tension (ie. a greater cost to change the edge) than edges with a higher uncertainty. Put another way, if a transformation between two poses has a higher uncertainty, it is permissible to modify the transformation more drastically than one with a lower uncertainty.

As a result of pose graph optimisation, the SLAM agent’s map of its environment should become more accurate, as the agent’s estimates of its position during the mapping should have been improved. In turn, this also improves subsequent localisation of the agent.
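The idea can be sketched with a toy one-dimensional pose graph: three poses on a line, two odometry edges, and one loop-closure edge that is trusted more (higher weight, ie. lower uncertainty). All numbers are invented example values, and the optimisation reduces to a small weighted least squares problem.

```python
# Toy 1D pose graph optimisation: three poses, two odometry edges and
# one loop-closure edge, solved as weighted least squares with NumPy.
import numpy as np

# Edges: (i, j, measured x_j - x_i, weight = confidence in the edge)
edges = [
    (0, 1, 1.0, 1.0),   # odometry: pose 1 is ~1 m past pose 0
    (1, 2, 1.0, 1.0),   # odometry: pose 2 is ~1 m past pose 1
    (0, 2, 1.8, 10.0),  # loop closure: pose 2 is ~1.8 m past pose 0
]

# Build the weighted linear system A x = b, anchoring pose 0 at the origin.
n = 3
A, b = [[1.0, 0.0, 0.0]], [0.0]   # anchor row: x0 = 0
for i, j, meas, w in edges:
    row = [0.0] * n
    row[i], row[j] = -w, w        # weighted residual w*(x_j - x_i - meas)
    A.append(row)
    b.append(w * meas)

x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
print(x)  # pose 2 pulled towards 1.8; pose 1 adjusted to sit in between
```

The stiff loop-closure edge dominates, so pose 2 ends up near 1.8 m rather than the 2.0 m that raw odometry suggested, with the intermediate pose redistributed accordingly.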

Real-Time Locating System (RTLS)

A system which performs live tracking of objects. SLAM is one possible method of performing this kind of real-time location, where the object being tracked is the SLAM agent.

Simultaneous Localisation And Mapping (SLAM)

A process where an agent builds up a map of its surrounding environment, while at the same time estimating its location in that environment. This is a continuous process, where both the map and the agent’s location are refined (and hopefully become more accurate) over time.

SLAM as a term describes the general problem of localisation and mapping. Different SLAM systems may approach solving this problem in different ways, so may or may not be easily comparable to the SLAM implementation at Slamcore.

Trajectory

A path that an object with mass follows through space, over time.

In the context of SLAM, a trajectory refers to the path taken by the SLAM agent through its environment. Given the same sensory inputs, modifying the parameters of the SLAM process (eg. confidence threshold for loop closures, prior knowledge about environmental features, etc.) can produce differences in the trajectories that are computed.

Slamcore-Specific Terms